GREP: A UNIX-like text-search utility; better than "FIND" in MS-DOS.
┌─────────────────────────────────────────┐
│ Mini-instructions: Type "GREP ?" at the │
│ DOS prompt for the grep help screen.    │
└─────────────────────────────────────────┘
Used by the Goodies Disk FETCH batch file. The following documentation
is included for those who wish to use GREP by itself for their own
purposes. I suggest putting it in your DOS directory and using it as
an "upgrade" for DOS's "FIND" command. -jkh-
Portions of this documentation are copyright (c) 1990 Borland
International. All rights reserved. Used with permission.
┌──────────┐
│ OVERVIEW │
└──────────┘
GREP (Global Regular Expression Print) is a powerful text-search
program derived from the UNIX utility of the same name. GREP searches
for a text pattern in one or more files or in its standard input
stream.
Here's a quick example of a situation where you might want to use
GREP. Suppose you wanted to find out which text files in your current
directory contained the string "Elisabeth". You would issue the
command
    grep Elisabeth *.txt
and GREP would respond with a list of the lines in each file (if any)
that contained the string "Elisabeth". (4DOS users can pipe the
output to LIST/S for convenient perusal and/or printing.)
Note: the strings "elisabeth" and "ELISABETH" would *not* be
considered matches. If you *do* want them to be matches, use
    grep -i Elisabeth *.txt
and the "-i" option (which means "ignore case") would force GREP to
treat uppercase and lowercase the same. More about this and other
options later.
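The difference between the two commands above can be modeled in a few
lines of Python. This is only an illustrative sketch of the matching
rule, not GREP's own code; the function name matching_lines and the
sample lines are made up for the example.

```python
def matching_lines(needle, lines, ignore_case=False):
    """Return the lines that contain needle, folding case if asked (like -i)."""
    if ignore_case:
        needle = needle.lower()
        return [line for line in lines if needle in line.lower()]
    return [line for line in lines if needle in line]


# Hypothetical file contents, one string per line.
sample = [
    "Dear Elisabeth,",
    "dear elisabeth,",
    "DEAR ELISABETH,",
    "Dear Elizabeth,",
]

print(matching_lines("Elisabeth", sample))                    # case-sensitive
print(matching_lines("Elisabeth", sample, ignore_case=True))  # like -i
```

The case-sensitive call keeps only the first line; with case folding
the first three lines are kept, and the differently spelled
"Elizabeth" is rejected either way.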
GREP can do a lot more than match a single, fixed string. In the
section that follows, you'll see how to make GREP search for any
string that matches a particular pattern.
┌─────────────────────┐
│ Command-line syntax │
└─────────────────────┘
The general command-line syntax for GREP is
    grep [options] searchstring [filespec ... ]
where
    options       consist of one or more letters, preceded by a
                  hyphen (-), that let you change various aspects of
                  GREP's behavior.
    searchstring  gives the pattern to search for.
    filespec      (a list of file specifications) tells GREP which
                  files to search. (If no file is specified, GREP
                  searches its standard input; this lets you use GREP
                  with pipes and redirection.)
In addition, the command
    GREP ?
prints a brief help screen showing GREP's command-line options,
special characters, and defaults. (See the description of the -u
command-line option for information on how to change GREP's
defaults.)
┌──────────────┐
│ GREP options │
└──────────────┘
In the command line, options are one or more single characters
preceded by a hyphen (-). Each individual character is a switch
that you can turn on or off: a plus symbol (+) after a character
turns the option on; a hyphen (-) after the character turns the
option off.
A character given with neither symbol turns the option on; for
example, -r means the same thing as -r+. You can list multiple
options individually (like this: -i -d -l), or you can combine them
(like this: -ild, or -il -d, and so on); it's all the same to GREP.
Here are the GREP option characters and their meanings:
Option   Meaning
______   ____________________________________________________
-c       Count only: Prints only a count of matching lines.
         For each file that contains at least one matching
         line, GREP prints the file name and a count of the
         number of matching lines. Matching lines are not
         printed.
-d       Directories: For each filespec specified on the
         command line, GREP searches for all files that
         match the file specification, both in the directory
         specified and in all subdirectories below the
         specified directory. If you give a filespec without
         a path, GREP assumes the files are in the current
         directory.
-i       Ignore case: GREP ignores upper/lowercase
         differences (case folding). GREP treats all letters
         a to z as identical to the corresponding letters A
         to Z in all situations.
-l       List match files: Prints only the name of each file
         containing a match. After GREP finds a match, it
         prints the file name and processing immediately
         moves on to the next file.
-n       Numbers: Each matching line that GREP prints is
         preceded by its line number.
-o       UNIX output format: Changes the output format of
         matching lines to support more easily the UNIX
         style of command-line piping. All lines of output
         are preceded by the name of the file that contained
         the matching line.
-r       Regular expression search: The text defined by
         searchstring is treated as a regular expression
         instead of as a literal string. This option is on
         by default. [Note to non-UNIX people: don't panic;
         "regular expression" is defined below. -jkh-]
-u       Update options: GREP will combine the options given
         on the command line with its default options and
         write these to the GREP.COM file as the new
         defaults. (In other words, GREP is self-configuring.)
         This option allows you to tailor the default option
         settings to your own taste. If you want to see what
         the defaults are in a particular copy of GREP.COM,
         type
             GREP ?
         at the DOS prompt. Each option on the help screen
         will be followed by a + or a - depending on its
         default setting. [See note below about using the
         -u option on a compressed GREP.COM. -jkh-]
-v       Nonmatch: Prints only nonmatching lines. Only lines
         that do not contain the search string are
         considered to be nonmatching lines.
-w       Word search: Text found that matches the regular
         expression is considered a match only if the
         characters immediately preceding and following it
         cannot be part of a word. The default word
         character set includes A to Z, 0 to 9, and the
         underscore ( _ ).
         An alternate form of this option lets you specify
         the set of legal word characters. Its form is
         -w[set], where set is any valid regular expression
         set definition (see below).
         If you define the set with alphabetic characters,
         it is automatically defined to contain both the
         uppercase and lowercase values for each letter in
         the set (regardless of how it is typed), even if
         the search is case-sensitive. If you use the -w
         option in combination with the -u option, the new
         set of legal characters is saved as the default
         set.
-z       Verbose: GREP prints the file name of every file
         searched. Each matching line is preceded by its
         line number. A count of matching lines in each file
         is given, even if the count is zero.
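To make the reporting options concrete, here is a rough Python model
of how -c, -l and -v change what is reported. It is a sketch built
only from the descriptions above: the function name grep_report, the
in-memory "files" and the tuple-based result are assumptions for
illustration, and GREP's actual output layout is not reproduced.

```python
def grep_report(files, needle, count_only=False, list_only=False,
                invert=False):
    """Model -c, -l and -v over in-memory files (name -> list of lines)."""
    report = []
    for name, lines in files.items():
        # -v flips the test: a line is selected when it does NOT contain needle.
        selected = [line for line in lines if (needle in line) != invert]
        if not selected:
            continue  # files with nothing to report are skipped
        if list_only:
            # -l: file name only (the real tool stops at the first match).
            report.append(name)
        elif count_only:
            # -c: file name plus the number of selected lines.
            report.append((name, len(selected)))
        else:
            report.extend((name, line) for line in selected)
    return report


# Hypothetical file contents.
files = {
    "A.TXT": ["alpha", "beta", "alpha beta"],
    "B.TXT": ["gamma"],
}

print(grep_report(files, "alpha"))
print(grep_report(files, "alpha", count_only=True))
print(grep_report(files, "alpha", list_only=True))
print(grep_report(files, "alpha", invert=True))
```

How the real program behaves when several of these options are
combined is not stated above, so the sketch only exercises them one
at a time.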
┌─────────────────────┐
│ Order of precedence │
└─────────────────────┘
Remember that each of GREP's options is a switch: Its state
reflects the way you last set it. At any given time, each option
can only be on or off. Each occurrence of a given option on the
command line overrides its previous definition. Given this command
line,
    grep -r -i- -d -i -r- main( my*.c
GREP runs with the -d option on, the -i option on, and the -r
option off. The initial "-r" is lost due to the subsequent "-r-",
and so forth.
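The "last setting wins" rule can be sketched as a small Python parser.
This is a model written from the description above, not GREP's own
parser: the name parse_switches is invented, the -w[set] form is not
handled, and every default other than -r (documented as on) is
assumed to be off.

```python
def parse_switches(tokens, defaults=None):
    """Apply option tokens left to right; the last setting of a letter wins."""
    state = dict(defaults or {})
    for token in tokens:
        if not token.startswith("-"):
            raise ValueError(f"not an option token: {token!r}")
        body = token[1:]
        i = 0
        while i < len(body):
            letter = body[i]
            i += 1
            turned_on = True  # a bare letter means "on", same as letter+
            if i < len(body) and body[i] in "+-":
                turned_on = body[i] == "+"
                i += 1
            state[letter] = turned_on
    return state


# -r is documented as on by default; the other defaults are assumed off.
defaults = {"r": True, "i": False, "d": False}

print(parse_switches(["-r", "-i-", "-d", "-i", "-r-"], defaults))
print(parse_switches(["-ild"]))
```

For the command line discussed above, the result has -d on, -i on and
-r off, and a combined token such as -ild turns on each of its
letters in turn.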
You can install your preferred default setting for each option in
GREP.COM with the -u option. For example, if you want GREP to
default to a verbose search (-z on), you can install it with the
following command:
    grep -u -z
Use GREP ? to check the currently set defaults.
[Note well: The -u option actually modifies the GREP.COM file itself.
Therefore, do NOT use the -u option if you've compressed GREP.COM
with LZEXE or any other "program packer". If you have compressed
GREP, and wish to customize it with the -u option, you must first
unpack GREP to its original form (or recopy it from the Goodies
Disk), use the -u option, and then compress it again. -jkh-]
┌───────────────────┐
│ The search string │
└───────────────────┘
To use GREP well, you'll need to become proficient at writing
search strings. The value of searchstring defines the pattern GREP
searches for. A search string can be either a regular expression or
a literal string.
In a regular expression, certain characters have special meanings:
They are operators that govern the search.
In a literal string, there are no operators: Each character is
treated literally.
You can enclose the search string in quotation marks to prevent
spaces and tabs from being treated as delimiters. The text matched
by the search string cannot cross line boundaries; that is, all the
text necessary to match the pattern must be on a single line.
A regular expression is either a single character or a set of
characters enclosed in brackets. A concatenation of regular
expressions is a regular expression.
┌──────────────────────────────────┐
│ Operators in regular expressions │
└──────────────────────────────────┘
When you use the -r option (on by default), the search string is
treated as a regular expression (not a literal expression). The
following characters take on special meanings:
Operator Meaning
________ ____________________________________________________
^        A circumflex at the start of the expression matches
         the start of a line. E.g. "^hyde" only finds those
         lines that *begin* with "hyde".
$        A dollar sign at the end of the expression matches
         the end of a line. E.g. "?$" only finds those lines
         that *end* with a question mark.
.        A period matches any character (single wildcard).
*        An expression followed by an asterisk wildcard
         matches ZERO OR MORE occurrences of that
         expression. For example, in "to*", the * operates on
         the expression o; it matches t, to, too, etc. (t
         followed by zero or more os), but doesn't match ta.
+        An expression followed by a plus sign matches ONE
         OR MORE occurrences of that expression: to+ matches
         to, too, etc., but not t.
[ ]      A string enclosed in brackets matches any character
         in that string, but no others. If the first
         character in the brackets is a circumflex (^), the
         expression matches any character EXCEPT the
         characters in the string.
         For example, [xyz] matches x, y, or z, while [^xyz]
         matches everything but x, y, and z. You can
         specify a range of characters with two characters
         separated by a hyphen (-). These can be combined to
         form expressions (like [a-bd-z?], which matches the
         ? character and any lowercase letter except c).
\        The backslash escape character tells GREP to search
         for the literal character that follows it. For
         example, "\." matches a period instead of "any
         character." The backslash can be used to quote
         itself; that is, you can use \\ to indicate a
         literal backslash character in a GREP expression.
Note:
Four of the "special" characters ($, ., *, and +) don't have any
special meaning when used within a bracketed set. In addition, the
character ^ is only treated specially if it immediately follows the
beginning of the set definition (immediately after the "["
delimiter).
Any ordinary character not mentioned in the preceding list matches
that character (> matches >, # matches #, and so on).
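The operators listed above behave the same way in Python's re module,
which makes it a convenient place to experiment. The following sketch
checks each rule from the table. Two caveats: Python treats ? as an
operator, so the "?$" example has to be written \?$ there, and the
helper names whole and within are made up for this illustration.
whole() tests an entire string, which mirrors the "matches t, to,
too" wording of the table, whereas GREP itself looks for the pattern
anywhere within a line, which is what within() does.

```python
import re


def whole(expr, text):
    """True if expr matches the entire text."""
    return re.fullmatch(expr, text) is not None


def within(expr, line):
    """True if expr matches somewhere inside line, as a line search would."""
    return re.search(expr, line) is not None


checks = {
    "^ anchors at line start":
        within(r"^hyde", "hyde park") and not within(r"^hyde", "mr hyde"),
    "$ anchors at line end":
        within(r"\?$", "who?") and not within(r"\?$", "who? me"),
    ". matches any one character":
        whole(r"h.t", "hat") and whole(r"h.t", "h?t"),
    "* is zero or more":
        whole(r"to*", "t") and whole(r"to*", "too") and not whole(r"to*", "ta"),
    "+ is one or more":
        whole(r"to+", "too") and not whole(r"to+", "t"),
    "[ ] set with ranges":
        whole(r"[a-bd-z?]", "?") and whole(r"[a-bd-z?]", "q")
        and not whole(r"[a-bd-z?]", "c"),
    "[^ ] negated set":
        whole(r"[^xyz]", "a") and not whole(r"[^xyz]", "x"),
    "backslash escapes":
        within(r"\.", "a.c") and not within(r"\.", "abc"),
}

for name, ok in checks.items():
    print(name, "->", ok)
```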
┌─────────────────────┐
│ File specifications │
└─────────────────────┘
filespec tells GREP which files (or groups of files) to search.
filespec can be an explicit file name, or a "generic" file name
incorporating the DOS ? and * wildcards. In addition, you can enter
a path (drive and directory information) as part of filespec. If
you give filespec without a path, GREP searches the current
directory.
If you don't specify any file specifications, input to GREP must
come from redirection (<) or a pipe (|).
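As a rough illustration of how a filespec with wildcards might be
expanded, with and without the -d option, here is a Python sketch
using the standard os and fnmatch modules. It is an approximation
under stated assumptions: expand_filespec is a hypothetical helper,
fnmatch also understands [ ] sets that DOS wildcards do not, and DOS
has its own rules for names without an extension that are not
modeled here.

```python
import fnmatch
import os
import tempfile


def expand_filespec(filespec, recurse=False):
    """Return sorted paths matching a filespec; recurse=True acts like -d."""
    directory, pattern = os.path.split(filespec)
    directory = directory or "."  # no path given: use the current directory
    matches = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            # DOS file names are not case-sensitive, so compare in lowercase.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                matches.append(os.path.join(root, name))
        if not recurse:
            break  # the first os.walk step is the named directory itself
    return sorted(matches)


# Build a small throwaway tree to try it on (the file names are made up).
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "SUB"))
    for rel in ("NOTES.TXT", "README.DOC", os.path.join("SUB", "MORE.TXT")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("sample\n")
    spec = os.path.join(root, "*.txt")
    top_only = [os.path.relpath(p, root) for p in expand_filespec(spec)]
    with_subdirs = [os.path.relpath(p, root)
                    for p in expand_filespec(spec, recurse=True)]

print(top_only)
print(with_subdirs)
```

Without recursion only NOTES.TXT is found; with recursion the copy in
the SUB directory is found as well.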
┌────────────────────┐
│ Some GREP examples │
└────────────────────┘
The following examples show how to combine GREP's features to do
different kinds of searches. They assume GREP's default settings
are unchanged.
┌───────────┐
│ Example 1 │
└───────────┘
The search string here tells GREP to search for the word "main" with
no preceding lowercase letters ([^a-z]), followed by zero or more
occurrences of blank spaces (\ *), then a left parenthesis.
Since spaces and tabs are normally considered to be command-line
delimiters, you must quote them if you want to include them as part
of a regular expression. In this case, the space after "main" is
quoted with the backslash escape character. You could also
accomplish this by placing the space in double quotes.
Command line:
    grep -r [^a-z]main\ *( *.c
Matches:
    main(i:integer)
    main(i,j:integer)
    if (main ()) halt;
Does not match:
    mymain()
    MAIN(i:integer);
Files searched:
    *.C in current directory.
┌───────────┐
│ Example 2 │
└───────────┘
Because the backslash (\) and period (.) characters usually have
special meaning in path and file names, you must place
the backslash escape character immediately in front of them if
you want to search for them. The -i option is used here, so
the search is not case sensitive.
Command line:
    grep -ri [a-c]:\\data\.fil *.c *.inc
Matches:
    A:\data.fil
    c:\Data.Fil
    B:\DATA.FIL
Does not match:
    d:\data.fil
    a:data.fil
Files searched:
    *.C and *.INC in current directory.
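The escaping in this example can be checked with Python's re module,
where a drive-letter set, an escaped backslash and an escaped period
are written the same way. This is only a cross-check of the
expression against the sample lines listed above, with
re.IGNORECASE standing in for the -i option; it says nothing about
how GREP is implemented.

```python
import re

# Drive letter a, b or c, then a colon, a literal backslash, and the
# file name with a literal period.
pattern = re.compile(r"[a-c]:\\data\.fil", re.IGNORECASE)

lines = [
    r"A:\data.fil",
    r"c:\Data.Fil",
    r"B:\DATA.FIL",
    r"d:\data.fil",   # wrong drive letter
    r"a:data.fil",    # no backslash after the colon
]

hits = [line for line in lines if pattern.search(line)]
print(hits)
```

Only the first three lines are selected, in agreement with the
"Matches" and "Does not match" lists.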
┌───────────┐
│ Example 3 │
└───────────┘
This format basically defines how to search for a given word.
Command line:
    grep -ri [^a-z]word[^a-z] *.doc
Matches:
    every new word must be on a new line.
    MY WORD!
    word--smallest unit of speech.
    In the beginning there was the WORD, and the WORD
Does not match:
    Each file has at least 2000 words.
    He misspells toward as toword.
Files searched:
    *.DOC in the current directory.
┌───────────┐
│ Example 4 │
└───────────┘
This format defines a basic "word" search.
Command line:
    grep -iw word *.doc
Matches:
    every new word must be on a new line. However,
    MY WORD!
    word: smallest unit of speech which conveys
    In the beginning there was the WORD, and
Does not match:
    each document contains at least 2000 words!
    He seems to continually misspell "toward" as "toword."
Files searched:
    *.DOC in the current directory.
┌───────────┐
│ Example 5 │
└───────────┘
This is an example of how to search for a string with embedded spaces.
Command line:
    grep "search string with spaces" *.doc *.c a:\work\myfile.*
Matches:
    This is a search string with spaces in it.
Does not match:
    THIS IS A SEARCH STRING WITH SPACES IN IT.
    This search string has spaces in it, too.
Files searched:
    *.DOC and *.C in the current directory, and MYFILE.* in a
    directory called \WORK on drive A.
┌───────────┐
│ Example 6 │
└───────────┘
This example searches for any one of the characters " . : ? ' and ,
at the end of a line.
The double quote within the range is preceded by an escape character,
so it is treated as a normal character instead of as the ending quote
for the string. Also, the $ character appears outside of the quoted
string. This demonstrates how regular expressions can be concatenated
to form a longer expression.
Command line:
    grep -rd "[ ,.:?'\"]"$ \*.doc
Matches:
    He said hi to me.
    Where are you going?
    In anticipation of a unique situation,
    Examples include the following:
    "Many men smoke, but fu man chu."
Does not match:
    He said "Hi" to me
    Where are you going? I'm headed to the
Files searched:
    *.DOC in the root directory and all its subdirectories on
    the current drive.
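Once the DOS command-line quoting is stripped away, the expression
that reaches the search is a bracketed set followed by $. The Python
sketch below tries that expression on the sample lines from this
example. It assumes Python's re module treats this set and the $
anchor the same way GREP does, which holds for these simple cases;
the variable names are invented for the illustration.

```python
import re

# The set from the command line (space , . : ? ' ") followed by end of line.
# A triple-quoted raw string avoids escaping either kind of quote mark.
end_punct = re.compile(r"""[ ,.:?'"]$""")

lines = [
    "He said hi to me.",
    "Where are you going?",
    "In anticipation of a unique situation,",
    "Examples include the following:",
    '"Many men smoke, but fu man chu."',
    'He said "Hi" to me',
    "Where are you going? I'm headed to the",
]

hits = [line for line in lines if end_punct.search(line)]
print(len(hits), "of", len(lines), "lines end with one of the set characters")
```

The first five lines end in a character from the set and are
selected; the last two end in a letter and are not.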
┌───────────┐
│ Example 7 │
└───────────┘
This example ignores case and just prints the names of any files
that contain at least one match. The three command-line examples
show different ways of specifying multiple options.
Command line:
    grep -ild " the " \*.doc
  or
    grep -i -l -d " the " \*.doc
  or
    grep -il -d " the " \*.doc
Matches:
    Anyway, this is the time we have
    do you think? The main reason we are
Does not match:
    He said "Hi" to me just when I
    Where are you going? I'll bet you're headed
Files searched:
    *.DOC in the root directory and all its subdirectories on
    the current drive.
┌───────────┐
│ Example 8 │
└───────────┘
This example redefines the current set of legal characters for a word
as the assignment operator (=) only, then does a word search. It matches
C assignment statements, which use a single equal sign (=), but not
equality tests, which use a double equal sign (==).
Command line:
    grep -w[=] = *.c
Matches:
    i = 5;
    j=5;
    i += j;
Does not match:
    if (i == t) j++;
    /* ======================= */
Files searched:
    *.C in the current directory.
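The word-search rule behind this example can be sketched in Python:
a match counts only if the characters on either side of it are not in
the set of word characters. The sketch is a model of the rule as
described for the -w option, not GREP's implementation. The name
word_match is hypothetical, the start and end of a line are assumed
to count as non-word positions, and only non-overlapping matches are
examined.

```python
import re
import string

# Default word characters as documented for -w: letters, digits, underscore.
DEFAULT_WORD_CHARS = string.ascii_letters + string.digits + "_"


def word_match(expr, line, word_chars=DEFAULT_WORD_CHARS, flags=0):
    """True if some match of expr has no word character on either side."""
    chars = set(word_chars)
    for m in re.finditer(expr, line, flags):
        if m.end() == m.start():
            continue  # ignore empty matches
        # Assumption: a line boundary behaves like a non-word character.
        before = line[m.start() - 1] if m.start() > 0 else None
        after = line[m.end()] if m.end() < len(line) else None
        if before not in chars and after not in chars:
            return True
    return False


# Like -w[=]: the only "word" character is the equal sign.
for text in ["i = 5;", "j=5;", "i += j;", "if (i == t) j++;",
             "/* ======================= */"]:
    print(repr(text), word_match("=", text, word_chars="="))
```

With the word set reduced to "=", a lone equal sign is a whole
"word", while each sign in "==" has another word character beside it
and is rejected. Leaving word_chars at its default gives the ordinary
word search described for the -w option.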